     This report describes an extension and modification of the signature table method, as described by Samuel [1], to adapt it for use in speech recognition.
     While the original scheme can be used in this more demanding situation with a moderate degree of success, it seemed desirable to adopt a somewhat more rigorous procedure for at least two compelling reasons. In the first place, the input parameters for speech appear to be less independent of one another than those for the game of checkers. The use of separate tables to obtain correlation coefficients for subsets of the input parameters, and the subsequent combination of different subsets by higher level tables, is therefore open to question. A second difficulty has to do with the multicategory classification that is involved in the case of speech, where it is essential for the outputs of the higher tables to have a clear interpretation, in some absolute sense, so that they may be compared one with another, something that was not required in the earlier situation where there was but one final output. In the multicategory case the outputs must themselves be correlation coefficients, or at least factors from which such coefficients can be computed, rather than correlations of the lower level correlation coefficients.
     The object of the present report is to:
a) Investigate the conditions under which the decomposition of a multidimensional space into two or more levels is valid, and
b) Devise a system of signature tables in which the outputs represent true probability densities for each training class and in which the values stored in these tables can be sequentially accumulated during a training phase.

     For those not familiar with signature tables as used by Samuel in programs which played the game of checkers, the concept is best illustrated (Fig. 1) by the arrangement of tables used in the program. There are 27 input terms. Each term evaluates a specific aspect of a board situation and is quantized into a limited but adequate range of values: 7, 5, and 3 in this case. The terms are divided into 9 sets of 3 terms each, forming the 9 first-level tables. Outputs from the first-level tables are quantized to 5 levels and combined into 3 second-level tables and, finally, into one third-level table whose output represents the figure of merit of the board in question.
     A signature table has an entry for every possible combination of values of its input vector. Thus there are 7*5*3 or 105 entries in each of the first-level tables. Training consists of accumulating two counts for each entry during a training sequence: count A is incremented when the current input vector represents a preferred move, and count D is incremented when it does not. The output from the table is computed as a correlation coefficient
          C = (A - D)/(A + D)
The figure of merit for a board is simply the coefficient obtained as the output from the final table.
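     To make the mechanics concrete, the following short sketch shows how one such table could be represented and trained in a present-day programming language. It is a minimal illustration only; the class name SignatureTable, the method names, and the sample inputs are assumptions made for this example and are not taken from the original checker program.

# Minimal sketch of one signature table with counts A and D per entry,
# assuming three inputs quantized to 7, 5, and 3 levels (105 entries).

class SignatureTable:
    def __init__(self, ranges):
        self.ranges = ranges                # e.g. (7, 5, 3)
        size = 1
        for r in ranges:
            size *= r
        self.A = [0] * size                 # counts for preferred moves
        self.D = [0] * size                 # counts for non-preferred moves

    def index(self, inputs):
        # Concatenate the quantized inputs into a single table address.
        i = 0
        for value, r in zip(inputs, self.ranges):
            i = i * r + value
        return i

    def update(self, inputs, preferred):
        i = self.index(inputs)
        if preferred:
            self.A[i] += 1
        else:
            self.D[i] += 1

    def output(self, inputs):
        # Correlation coefficient C = (A - D)/(A + D); zero if the entry is untrained.
        i = self.index(inputs)
        total = self.A[i] + self.D[i]
        return (self.A[i] - self.D[i]) / total if total else 0.0

table = SignatureTable((7, 5, 3))
table.update((3, 2, 1), preferred=True)
table.update((3, 2, 1), preferred=False)
table.update((3, 2, 1), preferred=True)
print(table.output((3, 2, 1)))              # prints 0.333..., i.e. (2 - 1)/(2 + 1)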
     The following three advantages emerge from this method of training and evaluation.
     1) Essentially arbitrary inter-relationships between the input terms are taken into account by any one table. The only loss of accuracy is in the quantization.
     2) The training is a very simple process of accumulating counts. The training samples are introduced sequentially, and hence simultaneous storage of all the samples is not required.
     3) The process linearizes the storage requirements in the parameter space. In the case shown this requires only 343 entries instead of the 105↑9 entries that would be needed were the entire space to be represented.

     Before investigating the conditions under which the decomposition of a multidimensional space is valid, it will simplify the explanation if we first consider a specific example in qualitative terms only.
     Consider the simple table arrangement shown in Fig. 2, where we wish to take account of 5 input parameters, each requiring 3 bits for its specification. Were we to do all this in one table it would require 2↑15 or 32,768 entries, instead of the 1024 entries required for the two-level arrangement shown. What we require as the output from the first-level table is some function which represents the contribution made by the inputs to the first table, so that the output from the second table truly represents the conditional probability that some specific training class has been represented by the specific inputs. In accordance with the usual practice we want to determine
          P( CLASS|(A,B,C,D,E) )
where A, B, C, D, and E represent a specific set of input parameters. Unfortunately this cannot be broken into two parts in any very simple way, since the values of A, B, C, D, and E are not unrelated. During the training sequence we are given specific values from which it is easy to compute the inverse probability
          P( (A,B,C,D,E)|CLASS )
and we can equally well compute the two separate probabilities
          P( (A,B,C)|CLASS )  and  P( (D,E)|CLASS ).
What we want is
          P( (A,B,C)|(D,E),CLASS )
since we do not have space to store the simultaneous probability of all of the input parameters; this value can be obtained from the relation
          P( (A,B,C,D,E)|CLASS ) = P( (A,B,C)|(D,E),CLASS ) * P( (D,E)|CLASS ).
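     The relation above is simply the chain rule of conditional probability. The following short sketch checks it numerically on a tiny made-up distribution for a single class; the numbers are purely illustrative and are not speech data.

# Numerical check of the factorization
#   P( (A,B,C,D,E)|CLASS ) = P( (A,B,C)|(D,E),CLASS ) * P( (D,E)|CLASS )
# on a small invented table of joint conditional probabilities.

from collections import defaultdict

# P( (A,B,C,D,E)|CLASS ) for a few points, keyed by ((A,B,C), (D,E)).
joint = {
    ((0, 0, 0), (0, 0)): 0.10,
    ((0, 0, 1), (0, 0)): 0.30,
    ((0, 0, 0), (0, 1)): 0.24,
    ((0, 0, 1), (0, 1)): 0.36,
}

# Marginal P( (D,E)|CLASS ).
p_de = defaultdict(float)
for (abc, de), p in joint.items():
    p_de[de] += p

# Conditional P( (A,B,C)|(D,E),CLASS ).
p_abc_given_de = {(abc, de): p / p_de[de] for (abc, de), p in joint.items()}

# The product of the two factors reproduces the joint probability at every point.
for (abc, de), p in joint.items():
    assert abs(p_abc_given_de[(abc, de)] * p_de[de] - p) < 1e-12
print("factorization holds on this example")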
     If we now focus our attention on the second table, we observe that the two inputs D and E partition this table into sections. In the expanded case, where all entries were in one table, there would have to be enough entries in each of these sections to allow for all possible combinations of values assignable to the variables A, B, and C. However, since we are not finally interested in which particular points in A,B,C space are associated with each region, but only in the probabilities, we need only allow enough space to represent these probabilities to the desired accuracy. The outputs stored in the first-level table corresponding to those A,B,C values that are to be grouped together must therefore contain the same value.
     What we are doing is to reduce the dimensionality of the space represented by the second-level table by 2, by grouping together those points in A,B,C space which give the same overall probability, given the values of D and E which define the region. In order to be able to do this we must associate with each point in A,B,C space not only the value of P( (A,B,C)|CLASS ) but also a value which contains contributions made by all those combinations of D and E which are found to be associated with the values of A, B, and C which define this point. The rigorous solution to this problem proves to be extremely awkward to effect, but fortunately there is a relatively simple approximation which has been found to yield useful results.
     We can describe the process as follows. Assume that some training has already taken place and that the tables have started to stabilize. Whenever an input is encountered which belongs to the class defined by a particular table array, the values of A, B, C, D, and E are determined. A count of 1 is added to the first field of the entry in the first-level table defined by A, B, and C. The output value from the third field of this entry is then read; it should be noted that this output has been computed on the basis of previous training. This output, together with the values of D and E, locates an entry in the second-level table, and a count of 1 is added to the first field of the second-level table as defined by this entry. A correction is then made to the second field of the first-level table which depends upon the concurrent values of D and E, as computed by a method yet to be defined. The values stored in the last or output column of this first-level table are all recomputed to allow for this correction. This calculation involves a count of the total number of times this particular class has been encountered during the entire learning sequence and the total number of learning samples, so these counts must also be updated. Finally, the output figure in the last field of the second table is also recomputed. A description of the exact details of this updating process will be deferred until after the detailed analysis which is given in the next section.
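     The steps just described can be summarized in the following sketch of a single training update. The data layout (dictionaries of per-entry fields) and all of the names are assumptions made for this illustration; in particular the correction function g(D,E) is only a placeholder for the method that is yet to be defined, and the recomputation of the output columns is indicated only by a comment, since its details are given in the next section.

from collections import defaultdict

def quantize_output(value, levels=8):
    # Map a stored output value in [0, 1] onto the limited range used to address table 2.
    return min(int(value * levels), levels - 1)

def train_step(table1, table2, a, b, c, d, e, g):
    # One presentation of an input known to belong to this table array's class.
    entry1 = table1[(a, b, c)]            # fields: 'count' (a), 'accum' (b), 'out' (c)
    entry1['count'] += 1                  # count added to the first field of table 1

    f = quantize_output(entry1['out'])    # read the output field, set by earlier training

    entry2 = table2[(f, d, e)]            # entry located by the table-1 output plus D and E
    entry2['count'] += 1                  # count added to the first field of table 2

    entry1['accum'] += g(d, e)            # correction to the second field of table 1
    # The output columns of table 1 and table 2 would now be recomputed from the
    # accumulated counts and the class/sample totals, as analyzed in the next section.

# Hypothetical usage with a placeholder correction of zero.
table1 = defaultdict(lambda: {'count': 0, 'accum': 0.0, 'out': 0.0})
table2 = defaultdict(lambda: {'count': 0, 'out': 0.0})
train_step(table1, table2, 3, 1, 2, 0, 5, g=lambda d, e: 0.0)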

caption for FIG. 2

FIG. 2  Arrangement during training (simplified arrangement)
     Inputs A, B, and C are concatenated to form an address which points to an entry in table 1 consisting of several fields or columns. Counts are accumulated in column (a) of the number of times the class is found to coincide with the particular values of A, B, and C. An accumulation is made in column (b) of a particular function of the inputs D and E which coincide with the values of A, B, and C that locate the entry. Data in column (c) are computed on the basis of the data in columns (a) and (b), as explained elsewhere.
     The input f(A,B,C,D,E) taken from column (c) of table 1, together with D and E, is similarly concatenated to form the address of an entry in table 2. Counts are accumulated in column (d) of the number of times the class is found to coincide with the values specifying the entry. Finally, column (e) is computed as explained elsewhere and contains a reasonable approximation to the desired values of
          P( CLASS|(A,B,C,D,E) ).
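     Once the tables have been trained, recognition amounts to the table look-ups described in this caption. The following sketch shows that readout path in code; the table layout, the quantization of column (c) to eight levels, and the sample contents are assumptions made for the illustration, not details taken from the memo.

def evaluate(table1_col_c, table2_col_e, a, b, c, d, e, levels=8):
    # Column (c) of table 1, addressed by the concatenated inputs A, B, C.
    f = table1_col_c[(a, b, c)]
    # Quantize the table-1 output to the limited range used to address table 2.
    f_q = min(int(f * levels), levels - 1)
    # Column (e) of table 2: an approximation to P( CLASS|(A,B,C,D,E) ).
    return table2_col_e[(f_q, d, e)]

# Made-up table contents for a single class.
table1_col_c = {(3, 1, 2): 0.55}
table2_col_e = {(4, 0, 5): 0.80}
print(evaluate(table1_col_c, table2_col_e, 3, 1, 2, 0, 5))   # prints 0.8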